SOC2069 Quantitative Methods

Week 4 Worksheet

Learning outcomes

By the end of the session, you should be familiar with:

  • running simple and multiple linear regression in JASP
  • performing a correlation analysis in JASP
  • model building in JASP
  • the interpretation of linear regression coefficients

Intro

We continue where we left off last week, building on Week 3 Worksheet - Exercise 2, in which we made a scatter plot of inequality and social trust using the Trust & Inequality (trust_inequality.dta) dataset. The dataset can be downloaded from https://cgmoreh.github.io/SOC2069-QUANT/Data/.

In that exercise we simplified the default output by removing the univariate distributions of the variables displayed on the margins and the regression line cutting through the plot. Now, however, we will focus on understanding what that “regression line” is actually telling us.

In later exercises we apply the same techniques to replicate a small part of the regression model reported in Österman (2021): specifically, model (1) in the summary Table 3, which is presented in more detail in Table A.3 of the Online Supplementary Material accompanying the article.

Finally, and probably outside class, you should practice the same linear regression modelling techniques on one of the assignment datasets and research questions.

Exercise 1: From a regression line to regression coefficients

If you did not download it last week, download the Trust & Inequality (trust_inequality.dta) dataset from https://cgmoreh.github.io/SOC2069-QUANT/Data/.

Task 1.1: Visualise the relationship

As a first step, create a scatter plot visualising the “relationship” (co-variation, joint distribution, …) between social trust (trust_pct) and inequality (inequality_s80s20). This is Exercise 2 from Week 3 - if you need a reminder of how to do it, check Week 3 Worksheet - Exercise 2 or your saved .jasp file containing your workshop analysis from Week 3.

Task 1.2: Model the relationship

Now let’s dig deeper into the meaning of the regression line by building a simple bivariate linear regression model of social trust as a function of societal inequality (i.e. a model aiming to explain/predict values of social trust in various countries depending on the value of societal inequality in those countries).

To build a linear regression model in JASP, click through the Menu tabs:

\[ \text{Regression} \longrightarrow \text{[Classical] Linear regression} \]

In the Linear regression panel, move the “social trust” variable to the \(\text{Dependent Variable}\) box and the “inequality” variable to the \(\text{Covariates}\) box.

The results from the linear regression model will appear in the outputs window on the right.
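
To make concrete what JASP estimates here, the following is a minimal Python sketch of the least-squares calculation behind a bivariate regression line. The five data points are invented for illustration and are not taken from trust_inequality.dta; the function name `fit_line` is likewise just a label chosen for this sketch.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical values, NOT from the real dataset:
inequality = [3.0, 4.0, 5.0, 6.0, 7.0]   # stand-in for inequality_s80s20
trust = [60.0, 50.0, 45.0, 30.0, 25.0]   # stand-in for trust_pct

intercept, slope = fit_line(inequality, trust)
print(f"predicted trust = {intercept:.1f} + ({slope:.1f} * inequality)")
```

With these made-up numbers the slope is -9 and the intercept is 87: each one-unit increase in the inequality measure goes with a predicted 9-point drop in trust. The values JASP reports for the real dataset will differ.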

Task 1.3: Interpret the regression model output

Questions

  • Using the lecture slides and Chapter 7 (“Linear regression with a single predictor”) from the Introduction to Modern Statistics (IMS), interpret the meaning of the regression coefficient on “inequality”.
  • Add a note on the JASP output under the \(\text{Coefficients}\) output and write down your interpretation there. [Tip: You’ve already practiced adding notes to the outputs in Week 2, Exercise 3, Point 7]
  • Where can you find the coefficient of correlation (\(R\)) in the outputs? What about the coefficient of determination (\(R^2\))?
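
As a reminder of the notation, a bivariate linear regression and its coefficient of determination can be written in the following general form. The symbols (\(b_0\), \(b_1\), \(\widehat{y}\), \(\bar{y}\)) follow common textbook usage and are assumed here; the JASP output may label the same quantities differently.

\[ \widehat{y}_i = b_0 + b_1 x_i \]

\[ R^2 = 1 - \frac{\sum_i (y_i - \widehat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} \]

Here \(y\) is social trust and \(x\) is inequality. \(b_0\) is the predicted value of \(y\) when \(x = 0\), and \(b_1\) is the change in the predicted value of \(y\) for a one-unit increase in \(x\). In a model with a single numeric predictor, \(R\) equals the absolute value of the Pearson correlation between \(x\) and \(y\).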

Task 1.4: Find the correlation coefficient using a “correlation” test instead

To run a simple bivariate correlation analysis in JASP, go through the Menu tabs:

\[ \text{Regression} \longrightarrow \text{[Classical] Correlation} \]

Move both of the variables of interest to the \(\text{Variables}\) box.

Check whether the results are the same as those obtained using linear regression.
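
The link between the two analyses can be sketched in Python as follows. This is an illustration on invented numbers (not the real dataset), and the helper names `pearson_r` and `r_squared` are chosen for this sketch only. It computes Pearson's correlation coefficient directly and compares its square with the \(R^2\) of the least-squares line.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared(x, y):
    """R^2 of the least-squares line of y on x: 1 - SS_residual / SS_total."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical values, NOT from the real dataset:
inequality = [3.0, 4.0, 5.0, 6.0, 7.0]
trust = [60.0, 50.0, 45.0, 30.0, 25.0]

r = pearson_r(inequality, trust)
r2 = r_squared(inequality, trust)
print(f"r = {r:.3f}, r squared = {r * r:.3f}, regression R squared = {r2:.3f}")
```

The correlation coefficient keeps its sign (negative in this made-up example) and does not depend on which variable is treated as the outcome, whereas the \(R\) reported for a regression model is never negative.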

Exercise 2: Linear regression with categorical predictors

Now we will build another simple bivariate regression model, but this time we will use the variable Region to model/explain/predict levels of “social trust” in different countries. Region is the only Nominal categorical variable in this dataset, and categorical variables behave differently in regression models.

Task 2.1: Describe the Region variable using a Frequency table

Tip: You have done this a few times in previous workshops. Check back on previous exercises if you need to remind yourself of how to create a frequency table.

Task 2.2: Build a simple bivariate regression model

The steps for fitting the regression are very similar to those in the previous exercise:

  • Click through the Menu tabs:

\[ \text{Regression} \longrightarrow \text{[Classical] Linear regression} \]

  • In the Linear regression panel, move the “social trust” variable to the \(\text{Dependent Variable}\) box
  • This time, however, move the Region variable to the \(\text{Factors}\) box instead of the \(\text{Covariates}\) box.

This tells JASP that the Region variable is categorical and should be modelled as such: each of its categories is treated as a separate factor/indicator variable, and the first category is automatically left out of the model (Task 2.1 above will tell you which one that is!). The omitted category becomes the baseline/reference against which the coefficients on all the other categories are compared. In effect, the omitted category is absorbed into the “Intercept”, which in this model is the predicted (mean) level of social trust for the reference category.

The results from the linear regression model will appear in the outputs window on the right.
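
The logic of indicator coding can be sketched in Python. The sketch below does not run a regression routine. Instead it uses the standard result that, with a single categorical predictor, the least-squares intercept equals the mean of the reference category and each coefficient equals the difference between that category's mean and the reference mean. The region labels and trust values are invented, and taking the alphabetically first label as the reference is an assumption of this sketch.

```python
from collections import defaultdict

# Hypothetical values, NOT from the real dataset. Region labels are invented.
regions = ["A", "A", "B", "B", "C", "C"]
trust = [60.0, 70.0, 40.0, 50.0, 20.0, 30.0]

# Mean of the outcome within each category.
groups = defaultdict(list)
for region, value in zip(regions, trust):
    groups[region].append(value)
means = {region: sum(values) / len(values) for region, values in groups.items()}

# Assumption of this sketch: the alphabetically first category is the reference.
reference = sorted(means)[0]
intercept = means[reference]                      # mean of the reference category
coefs = {region: means[region] - intercept        # difference from the reference
         for region in sorted(means) if region != reference}


def predict(region):
    """Prediction from the indicator-coded model: intercept plus that region's coefficient."""
    return intercept + coefs.get(region, 0.0)


print("Intercept:", intercept)
for region, b in coefs.items():
    print(f"Region ({region}): {b:+.1f}")
```

In this made-up example the reference category "A" has no coefficient of its own; its mean (65) appears as the intercept, and the coefficients for "B" (-20) and "C" (-40) are differences from that mean.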

Questions

  • Using the lecture slides and the assigned readings from Introduction to Modern Statistics (IMS), interpret the meaning of the regression coefficients on each reported level of the Region variable.
  • Which one is the “reference”/“baseline” category?
  • Add a note on the JASP output under the \(\text{Coefficients}\) output and write down your interpretation there. [Tip: You’ve already practiced adding notes to the outputs in Week 2, Exercise 3, Point 7]
  • Where can you find the coefficient of correlation (\(R\)) in the outputs? What about the coefficient of determination (\(R^2\))? Are they meaningful in this context? Why so, or why not?

Exercise 3: Build a multiple regression model

We can now combine the separate bivariate analyses in the previous two exercises into a more elaborate multiple regression model. The procedure to build a multiple regression model is the same as in the simple regression models before, but this time we add both of the independent variables into the model:

\[ \text{Regression} \longrightarrow \text{[Classical] Linear regression} \]

  • In the Linear regression panel, move the “social trust” variable to the \(\text{Dependent Variable}\) box
  • Move the “inequality” variable to the \(\text{Covariates}\) box
  • Move the Region variable to the \(\text{Factors}\) box

The results will appear in the outputs window on the right. We now have a statistical model that explains variation in “social trust” in terms of both “inequality” and “Region”. Put differently, if our main aim is to estimate how “inequality” is associated with “social trust”, we have obtained an estimate of that association that also accounts for variation due to differences between the Regions to which countries belong.
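
Why the coefficient on “inequality” can change once Region is added can be illustrated with a small Python sketch. It fits both models by solving the least-squares normal equations in plain Python; the helper names `solve` and `ols` are chosen for this sketch. The data are constructed, not real: trust is generated without noise as 90 - 10 × inequality - 20 × (region B indicator), and the countries in region B are also given higher inequality.

```python
def solve(a, b):
    """Solve the linear system a @ coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (m[i][n] - tail) / m[i][i]
    return coef


def ols(columns, y):
    """Least-squares coefficients for y on the given predictor columns (intercept first)."""
    design = [[1.0] * len(y)] + columns               # add the constant column
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in design] for ci in design]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in design]
    return solve(xtx, xty)                            # normal equations: X'X b = X'y


# Hypothetical values, NOT from the real dataset.
inequality = [2.0, 3.0, 4.0, 4.0, 5.0, 6.0]
region_b = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]            # 1 = region B, 0 = reference region
trust = [90 - 10 * x - 20 * d for x, d in zip(inequality, region_b)]

simple = ols([inequality], trust)
multiple = ols([inequality, region_b], trust)
print("simple   (intercept, inequality):", [round(b, 2) for b in simple])
print("multiple (intercept, inequality, region B):", [round(b, 2) for b in multiple])
```

In this constructed example the multiple model recovers the coefficients used to generate the data (90, -10 and -20), whereas the simple model gives a steeper inequality slope of about -16, because region B combines higher inequality with lower trust. Real data contain noise, so the comparison in JASP will not be this clean.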

Questions

  • Using the lecture slides and the assigned readings from Introduction to Modern Statistics (IMS), interpret the meaning of each regression coefficient, comparing them with the ones obtained from the simpler models in the previous exercises.
  • Add a note on the JASP output under the \(\text{Coefficients}\) output and write down your interpretations there.
  • Where can you find the coefficient of correlation (\(R\)) in the outputs? What about the coefficient of determination (\(R^2\))? Are they meaningful in this context? Why so, or why not?

References

Österman M (2021) Can we trust education for fostering trust? Quasi-experimental evidence on the effect of education and tracking on social trust. Social Indicators Research 154(1): 211–233.